Programming Massively Parallel Processors: A Practical Approach: Beyond Linear Arrays: Scaling to Multidimensional Data

Welcome to the Great Shift. In CPU programming, we specify how to iterate; in GPGPU programming, we specify what a single iteration looks like. This move from instruction-centric logic to data-centric logic is made possible by the Kernel Abstraction.

1. The __global__ Blueprint

When you use the __global__ qualifier, you are not just writing a function; you are designing a scalable blueprint. A single execution of the kernel body represents one independent unit of work, which lets the GPU schedule thousands of identical tasks across a very large number of cores without any manual thread management.
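The contrast between "how to iterate" and "what one iteration looks like" can be sketched as follows. This is a minimal illustration, not a reference implementation: the names `scaleOnHost` and `scaleOnDevice`, the array size, and the use of `cudaMallocManaged` are assumptions made for the example, and error checking is omitted for brevity. The launch uses a single block so that `threadIdx.x` alone identifies the element.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// CPU version: the loop spells out HOW to iterate over the data.
void scaleOnHost(float *data, float factor, int n) {
    for (int i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}

// GPU version: the body describes ONE iteration only.
// __global__ marks the function as a kernel: called from the host, run on the device.
__global__ void scaleOnDevice(float *data, float factor) {
    int i = threadIdx.x;  // single-block launch, so the thread index is the element index
    data[i] *= factor;
}

int main() {
    const int n = 256;  // hypothetical size, small enough to fit in one block
    float reference[n];
    float *data = nullptr;
    cudaMallocManaged(&data, n * sizeof(float));  // memory visible to host and device

    for (int i = 0; i < n; ++i) {
        reference[i] = static_cast<float>(i);
        data[i] = static_cast<float>(i);
    }

    scaleOnHost(reference, 2.0f, n);      // sequential loop on the CPU
    scaleOnDevice<<<1, n>>>(data, 2.0f);  // one block of n threads on the GPU
    cudaDeviceSynchronize();              // wait for the kernel to finish

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (data[i] != reference[i]) ++mismatches;
    }
    printf("mismatches between host and device results: %d\n", mismatches);

    cudaFree(data);
    return 0;
}
```

Note that the kernel contains no loop at all: the iteration is supplied by the launch, which creates one thread per element.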

2. The Global Address Resolver

How does one thread among millions find its target? It relies on a fixed contract known as the indexing formula:

$$\text{ThreadID} = \text{blockIdx.x} \times \text{blockDim.x} + \text{threadIdx.x}$$

This formula acts as a coordinate system, linking the software's logical data (the array) to the hardware's physical hierarchy (blocks and threads).
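As a quick illustration of the formula, a thread with blockIdx.x = 3, blockDim.x = 128, and threadIdx.x = 5 resolves to 3 × 128 + 5 = 389. The sketch below has every thread write its own computed index into an array so the host can confirm the mapping. The kernel name `recordGlobalIndex`, the sizes, and the use of `cudaMallocManaged` are assumptions chosen for the example; error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread computes its global index and stores it at that position.
__global__ void recordGlobalIndex(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // the indexing formula
    if (i < n) {     // the last block may contain threads past the end of the array
        out[i] = i;
    }
}

int main() {
    const int n = 1000;               // hypothetical array length
    const int threadsPerBlock = 256;  // hypothetical block size
    // Round up so every element is covered: 4 blocks, 1024 threads in total.
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    int *out = nullptr;
    cudaMallocManaged(&out, n * sizeof(int));

    recordGlobalIndex<<<blocks, threadsPerBlock>>>(out, n);
    cudaDeviceSynchronize();

    // If the formula maps threads to elements one-to-one, out[i] == i everywhere.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (out[i] != i) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);

    cudaFree(out);
    return 0;
}
```

Because 1000 is not a multiple of 256, the launch creates 24 more threads than there are elements. The `if (i < n)` guard keeps those extra threads from writing outside the array.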

3. Execution Configuration

The <<<B, T>>> parameters define the shape of the grid: B blocks of T threads each. This provides Transparent Scalability: your code runs the same logic whether the hardware has 2 SMs or 80 SMs.
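One way to see that the kernel logic is independent of the grid shape is to launch the same kernel under several configurations and check that the result does not change. The sketch below does this under stated assumptions: the kernel name `addOne`, the array length, and the block sizes are hypothetical, and error checking is omitted. It also prints the SM count of device 0, which the kernel code never needs to know.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The kernel makes no reference to the number of SMs or to a specific grid shape.
__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += 1;
    }
}

int main() {
    const int n = 10000;  // hypothetical array length
    int *data = nullptr;
    cudaMallocManaged(&data, n * sizeof(int));

    // Query the hardware only to report it; the launch logic does not depend on it.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("SMs on device 0: %d\n", prop.multiProcessorCount);

    const int blockSizes[] = {64, 128, 256};  // hypothetical values of T
    for (int T : blockSizes) {
        for (int i = 0; i < n; ++i) data[i] = i;  // reset the input

        int B = (n + T - 1) / T;  // enough blocks to cover all n elements
        addOne<<<B, T>>>(data, n);
        cudaDeviceSynchronize();

        bool ok = true;
        for (int i = 0; i < n; ++i) {
            if (data[i] != i + 1) ok = false;
        }
        printf("<<<%d, %d>>> -> %s\n", B, T, ok ? "ok" : "mismatch");
    }

    cudaFree(data);
    return 0;
}
```

The hardware scheduler decides how the B blocks are distributed over the available SMs, so the same binary should behave identically on a small or a large GPU, differing only in how long it takes.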

QUESTION 1

What is the primary role of the __global__ qualifier?

To define a function that runs on the CPU and is called by the GPU.

To mark a function as a kernel that is callable from the host and executes on the device.

To synchronize all threads across the entire GPU grid.

To allocate memory in the global memory space.

QUESTION 2

If blockIdx.x = 2, blockDim.x = 256, and threadIdx.x = 10, what is the global index?

266

512

522

778

QUESTION 3

What does 'Transparent Scalability' imply in CUDA?

The memory automatically scales with the size of the input array.

The same code can run on different GPUs with varying SM counts without modification.

Threads can see into the registers of other threads.

The kernel speed increases linearly with the clock speed of the CPU.

QUESTION 4

Why is the if (i < n) check necessary in a kernel?

To prevent the GPU from overheating.

To ensure threads do not access memory outside the valid array bounds.

To check if the kernel is running on the correct SM.

To synchronize memory access between threads.

QUESTION 5

Which variable represents the number of threads within a single block?

gridDim.x

blockIdx.x

blockDim.x

threadIdx.x
